Update ASI01_Agent_Behaviour_Hijack .md #721

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

kayunder wants to merge 1 commit into OWASP:main from kayunder:patch-3

kayunder commented Sep 21, 2025

Proposed draft for Agent Behavior Hijacking.

[Title of Your PR]

Key Changes:

List major changes and core updates
Keep each line under 80 characters
Focus on the "what" and "why"

Added:

New features/functionality
New files/configurations
New dependencies

Changed:

Updates to existing code
Configuration changes
Dependency updates

Removed:

Deleted files/code
Removed dependencies
Cleaned up configurations


          Update ASI01_Agent_Behaviour_Hijack .md

12908b4

Proposed draft for Agent Behavior Hijacking.

Signed-off-by: kayunder <[email protected]>

kayunder requested review from guerilla7, hoeg and itskerenkatz as code owners

September 21, 2025 14:31

itskerenkatz requested changes

View reviewed changes

Collaborator

itskerenkatz left a comment

Loved it! super concise and yet detailed and practical!
I have added some comments and thoughts

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              **Description:**

              A brief description of the vulnerability that includes its potential effects such as system compromises, data breaches, or other security concerns.

              AI Agents require autonomous ability to plan and execute tasks to achieve a goal. The independently initiated chain of events that occur throughout the activity path of an Agent can often be described as the “behavior” of the agent. Due to the overall weakness in the agent’s ability to determine its own instructions against the possible nefarious injection of instructions that would direct the agent to behave in an unintended way, the intended behavior of the agent is susceptible to manipulation. This inherent weakness is due to the use of natural language processing within the AI components of the agent.

Collaborator

itskerenkatz Sep 22, 2025

I think it's not only due to the language processing concept, but also due to the methods in which the alignment and RLHF processes are being done, right?

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              A brief description of the vulnerability that includes its potential effects such as system compromises, data breaches, or other security concerns.

              AI Agents require autonomous ability to plan and execute tasks to achieve a goal. The independently initiated chain of events that occur throughout the activity path of an Agent can often be described as the “behavior” of the agent. Due to the overall weakness in the agent’s ability to determine its own instructions against the possible nefarious injection of instructions that would direct the agent to behave in an unintended way, the intended behavior of the agent is susceptible to manipulation. This inherent weakness is due to the use of natural language processing within the AI components of the agent.

              The OWASP LLM01:2025 Prompt Injection risk is also highly relevant to Agent Behavior Hijacking. Prompt injection occurs when malicious input alters an LLM’s behavior or output in unintended ways. For an autonomous AI agent, a well-crafted prompt injection can override system instructions, tricking the agent into taking harmful actions, disclosing secrets, or executing commands that were never intended. In this way, prompt injection directly facilitates the hijacking of an agent’s decision-making process, making it one of the most potent enablers of Agent Behavior Hijacking.

Collaborator

itskerenkatz Sep 22, 2025

An interesting question to me is wether or not we want to target or even mention that this could be either prompt injection that has a specific target or harm to achieve, or a jailbreak that aims to completely override the agent's guardrails (I think it is quite common to distinguish between the two and it may be helpful for the readers), what do you think? I do agree that at the end of the day these are two manipulations results in harmful risky consequences, but maybe worth mention it?

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              The OWASP LLM01:2025 Prompt Injection risk is also highly relevant to Agent Behavior Hijacking. Prompt injection occurs when malicious input alters an LLM’s behavior or output in unintended ways. For an autonomous AI agent, a well-crafted prompt injection can override system instructions, tricking the agent into taking harmful actions, disclosing secrets, or executing commands that were never intended. In this way, prompt injection directly facilitates the hijacking of an agent’s decision-making process, making it one of the most potent enablers of Agent Behavior Hijacking.

              Additionally, When correlated to the OWASP Agentic AI Threats and Mitigations Guide, there are a few threats that can be directly linked to Agent Behavior Hijacking. Specifically, T01 – Memory Poisoning, T02 – Tool Misuse, T06 – Goal Manipulation, and T07 – Misaligned & Deceptive Behaviors all describe scenarios where an attacker subverts an agent’s autonomy and decision-making.

Collaborator

itskerenkatz Sep 22, 2025

Loved it!

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              2. Example 2: Another instance or type of this vulnerability.

              3. Example 3: Yet another instance or type of this vulnerability.

              1. Example 1: Indirect Prompt Injection via hidden instruction payloads embedded in web pages or documents silently redirect an agent to exfiltrate sensitive data or misuse connected tools.

              2. Example 2: Indirect Prompt Injection via email hijacks an agent’s internal mail capability, sending unauthorized messages under a trusted identity.

Collaborator

itskerenkatz Sep 22, 2025

Maybe add: "an email sent from outside of the organization" to emphasize how easy it is to preform the attack from outside of the company as an attacker

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              3. Example 3: Yet another instance or type of this vulnerability.

              1. Example 1: Indirect Prompt Injection via hidden instruction payloads embedded in web pages or documents silently redirect an agent to exfiltrate sensitive data or misuse connected tools.

              2. Example 2: Indirect Prompt Injection via email hijacks an agent’s internal mail capability, sending unauthorized messages under a trusted identity.

              3. Example 3: System Prompt Override manipulates core instructions to reorient the agent’s objectives toward attacker-defined outcomes.

Collaborator

itskerenkatz Sep 22, 2025

How about we'll be even more specific in here?
I think if we can link to a specific attack pattern that will be more feasible for the readers it might be helpful.
Also - I think mentioning the outcome of hijacking a workflow is important in here!
Just an example can be "CEO injection attack", or a clients serving bot refunding a user with much greater refund etc. what do you think?

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              1. Prevention Step 1: A step or strategy that can be used to prevent the vulnerability or mitigate its effects.

              2. Prevention Step 2: Another prevention step or strategy.

              3. Prevention Step 3: Yet another prevention step or strategy.

              1. Prevention Step 1: Establish continuous monitoring of agent activity throughout the chain of actions to build a known baseline of behavior. This baseline will allow for alerts to be triggered when the behavior of the agent strays from the established historical pattern.

Collaborator

itskerenkatz Sep 22, 2025

I am loving it.
I think not only from historical pattern but also from intended goal, right?

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              2. Prevention Step 2: Another prevention step or strategy.

              3. Prevention Step 3: Yet another prevention step or strategy.

              1. Prevention Step 1: Establish continuous monitoring of agent activity throughout the chain of actions to build a known baseline of behavior. This baseline will allow for alerts to be triggered when the behavior of the agent strays from the established historical pattern.

              2. Prevention Step 2: Incorporate AI Agents into the established Insider Threat Program to monitor the behavior against established baselines and allow for investigation in case of outlier activity.

Collaborator

itskerenkatz Sep 22, 2025

Maybe also:
Surface any insider prompts intended to get access to sensitive data or to alter the agent behavior?

Collaborator

itskerenkatz Sep 22, 2025

I think another super interesting topic that is not discussed enough: how about when users are doing a reconnaissance to eventually perform an adversarial? Asking lots of questions about the agents goals and boundaries to eventually attack it - I think it worth looking for it too as a preventive mitigation.

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              3. Prevention Step 3: Yet another prevention step or strategy.

              1. Prevention Step 1: Establish continuous monitoring of agent activity throughout the chain of actions to build a known baseline of behavior. This baseline will allow for alerts to be triggered when the behavior of the agent strays from the established historical pattern.

              2. Prevention Step 2: Incorporate AI Agents into the established Insider Threat Program to monitor the behavior against established baselines and allow for investigation in case of outlier activity.

              3. Prevention Step 3: Prevention Step 3: Ensure the weights of the goals in the Agent system prompt are balanced accurately to ensure the behavior of the agent will adhere to the intent of the builders. This will help in identifying possible issues.

Collaborator

itskerenkatz Sep 22, 2025

This will help to prevent some of the issues :)

...tiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI01_Agent_Behaviour_Hijack .md

    
              Scenario #1: EchoLeak — Zero-Click Indirect Prompt Injection - An attacker emails a crafted message that silently triggers Microsoft 365 Copilot to execute hidden instructions, causing the AI to exfiltrate confidential emails, files, and chat logs without any user interaction.

              Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited.

              Scenario #2: Operator Prompt Injection via Web Content - An attacker plants malicious content on a web page that the Operator agent processes, tricking it into following unauthorized instructions. The Operator agent then accesses authenticated internal pages and exposes users’ private data, demonstrating how lightly guarded autonomous agents can leak sensitive information through prompt injection.

Collaborator

itskerenkatz Sep 22, 2025

I think we want to add one around workflow hijacking
My folks have created the CEO injection attack that really explains it, but they just posted it on Linkedin or something.. I can ask them to get it documented somewhere so we can refer to it because I think it's really an important message - or of course if you have any other example or demonstration to it I think it'll be great!

Contributor

almogbhl Oct 1, 2025 •

edited

Loading

@itskerenkatz @kayunder can this be useful here?

Visual Studio Code & Agentic AI workflows RCE – Sep 2025	Command injection in agentic AI workflows can let a remote, unauthenticated attacker cause VS Code to run injected commands on the developer’s machine.	ASI01+ASI02+ASI05

Google Gemini Trifecta — Cloud Assist, Search Model & Browsing (Sep 2025)	Indirect prompt injection through logs, search history, and browsing context can trick Gemini into exposing sensitive data and carrying out unintended actions across connected Google services.	ASI01+ASI02

https://www.tenable.com/blog/the-trifecta-how-three-new-gemini-vulnerabilities-in-cloud-assist-search-model-and-browsing

cjj884 Nov 5, 2025 •

edited

Loading

Recommend scenario title be "Operator Indirect Prompt Injection via Web Content" reflecting indirect nature of injection and consistent with example provided above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

itskerenkatz itskerenkatz requested changes

guerilla7 Awaiting requested review from guerilla7 guerilla7 is a code owner

hoeg Awaiting requested review from hoeg hoeg is a code owner

+2 more reviewers

almogbhl almogbhl left review comments

cjj884 cjj884 left review comments

Requested changes must be addressed to merge this pull request.

Labels

None yet